Overlap versus Imbalance
Authors
Abstract
In this paper we give a systematic analysis of the relationship between imbalance and overlap as factors influencing classifier performance. We demonstrate that these two factors have interdependent effects, so their influence cannot be fully understood by considering each in isolation. Although the imbalance problem can be regarded as a symptom of the small disjuncts problem, which is addressed by using larger training sets, the overlap problem is of a fundamentally different character: when overlap is present, using more training data can actually make the performance of learned classifiers worse. We also examine the effects of overlap and imbalance on the complexity of the learned model and show that, in this respect, overlap is a far more serious factor than imbalance.
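To make the contrast concrete, here is a minimal sketch that is not the authors' experimental setup. It assumes two one-dimensional Gaussian classes with unit variance, a majority class N(0, 1) and a minority class N(separation, 1), so that `separation` stands in for overlap and `minority_prior` for imbalance; all function names are hypothetical. `bayes_error` gives the lowest error any classifier can reach on these distributions, which no amount of training data can reduce, while `leaves_to_fit` counts how many leaves an unpruned 1-D decision tree needs to fit a finite sample exactly, as a rough proxy for model complexity.

```python
import math
import random


def normal_sf(x: float) -> float:
    """Upper-tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bayes_error(separation: float, minority_prior: float) -> float:
    """Bayes error for majority ~ N(0, 1) versus minority ~ N(separation, 1).

    Assumption: both classes are 1-D Gaussians with unit variance.
    Smaller `separation` means more overlap; smaller `minority_prior`
    means more imbalance.
    """
    if separation <= 0 or not 0.0 < minority_prior < 1.0:
        raise ValueError("need separation > 0 and 0 < minority_prior < 1")
    p1, p0, d = minority_prior, 1.0 - minority_prior, separation
    # Predict minority when p1 * f1(x) > p0 * f0(x), which reduces to x > t.
    t = d / 2.0 + math.log(p0 / p1) / d
    # Errors: majority points above t plus minority points below t.
    return p0 * normal_sf(t) + p1 * normal_sf(d - t)


def leaves_to_fit(n: int, separation: float, minority_prior: float,
                  seed: int = 0) -> int:
    """Minimum leaves an unpruned 1-D tree needs to fit a sample exactly.

    With distinct feature values, each label change between neighbouring
    sorted points forces another split, so the count equals the number of
    same-label runs.
    """
    rng = random.Random(seed)
    # A fixed minority count keeps both classes present in every sample.
    n_min = max(1, round(n * minority_prior))
    points = [(rng.gauss(0.0, 1.0), 0) for _ in range(n - n_min)]
    points += [(rng.gauss(separation, 1.0), 1) for _ in range(n_min)]
    points.sort()
    labels = [label for _, label in points]
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


if __name__ == "__main__":
    for d in (0.5, 2.0, 8.0):
        for p in (0.5, 0.05):
            print(f"separation={d}, minority prior={p}: "
                  f"Bayes error={bayes_error(d, p):.4f}, "
                  f"leaves for n=200/2000: "
                  f"{leaves_to_fit(200, d, p)}/{leaves_to_fit(2000, d, p)}")
```

Under these assumptions, well-separated classes keep a near-zero Bayes error and a two-leaf tree however imbalanced they are, whereas heavily overlapping classes have a fixed error floor and the number of leaves needed to fit the sample keeps growing with the sample size. This only illustrates one mechanism consistent with the abstract's complexity claim; it does not reproduce the paper's experiments.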
Similar resources
Combined Effects of Class Imbalance and Class Overlap on Instance-Based Classification
In real-world applications, it has often been observed that class imbalance (significant differences in class prior probabilities) may produce an important deterioration of the classifier performance, in particular with patterns belonging to the less represented classes. This effect becomes especially significant on instance-based learning due to the use of some dissimilarity measure. We analyz...
A hybrid method to face class overlap and class imbalance on neural networks and multi-class scenarios
Class imbalance and class overlap are two of the major problems in data mining and machine learning. Several studies have shown that these data complexities may affect the performance or behavior of artificial neural networks. Strategies proposed to deal with both challenges have been applied separately. In this paper, we introduce a hybrid method for handling both class imbalance and class ove...
An Empirical Study of the Behavior of Classifiers on Imbalanced and Overlapped Data Sets
Class imbalance has been reported as an important obstacle to apply traditional learning algorithms to real-world domains. Recent investigations have questioned whether the imbalance is the unique factor that hinders the performance of classifiers. In this paper, we study the behavior of six algorithms when classifying imbalanced, overlapped data sets under uncommon situations (e.g., when the o...
Back Propagation with Balanced MSE Cost Function and Nearest Neighbor Editing for Handling Class Overlap and Class Imbalance
The class imbalance problem has been considered a critical factor for designing and constructing supervised classifiers. In the case of artificial neural networks, this complexity negatively affects the generalization process on under-represented classes. However, it has also been observed that the decrease in the performance attainable by standard learners is not directly caused by the cla...
Blind frequency offset estimation for overlap PCC-OFDM systems in presence of phase noise
This paper presents a technique for frequency offset estimation for polynomial cancellation coded orthogonal frequency division multiplexing with symbols overlapped in the time domain (overlap PCC-OFDM) in the presence of phase noise. The frequency offset estimator is designed based on the subcarrier pair imbalance (SPI) caused by frequency offset. The estimation is performed in the frequency d...